ESQL: Make field fusion generic #137382

nik9000 · 2025-10-30T14:38:57Z

Speeds up queries like

```
FROM foo
| STATS SUM(LENGTH(field))
```

by fusing the `LENGTH` into the loading of the `field` if it has doc values. Running a fairly simple test (https://gist.github.com/nik9000/9dac067f8ce29875a4fb0f0359a75091), I'm seeing that query drop from 48ms to 28ms. So, like, 40% less time (about 1.7x faster).

More importantly, this makes the mechanism for fusing functions into field loading generic. All you have to do is implement `BlockLoaderExpression` on your expression and return non-null from `tryFuse`.
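A minimal, self-contained Java sketch of that contract, for illustration only. Every type here (`SearchStats` with a single `hasDocValues` method, `Expression`, `FieldRef`, the `Fuse` record, `Length`, `FixedStats`) is a hypothetical simplification, not the actual ES|QL API. The point is just the shape: an expression opts in by implementing the interface and returns `null` whenever it can't be fused into the load.

```java
import java.util.Set;

// Hypothetical stand-in for the planner's view of index statistics.
interface SearchStats {
    boolean hasDocValues(String field);
}

// Hypothetical minimal expression tree: a node type and a bare field reference.
interface Expression {}

record FieldRef(String name) implements Expression {}

// Hypothetical fusion result: which field to load and which function to apply while loading it.
record Fuse(FieldRef field, String config) {}

// Hypothetical opt-in interface: return null when the expression can't be fused into the load.
interface BlockLoaderExpression {
    Fuse tryFuse(SearchStats stats);
}

// LENGTH(x): fusable only when x is a plain field that has doc values.
record Length(Expression arg) implements Expression, BlockLoaderExpression {
    @Override
    public Fuse tryFuse(SearchStats stats) {
        if (arg instanceof FieldRef f && stats.hasDocValues(f.name())) {
            return new Fuse(f, "length");
        }
        return null;
    }
}

// Toy SearchStats: a fixed set of field names that have doc values.
record FixedStats(Set<String> docValueFields) implements SearchStats {
    @Override
    public boolean hasDocValues(String field) {
        return docValueFields.contains(field);
    }
}
```

With this shape, an optimizer rule doesn't need to know about individual functions: it asks each candidate expression for a `Fuse` and only rewrites the plan when it gets one back.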


elasticsearchmachine · 2025-10-30T14:39:21Z

Hi @nik9000, I've created a changelog YAML for you.

carlosdelest · 2025-10-30T16:15:33Z

...java/org/elasticsearch/xpack/esql/expression/function/blockloader/BlockLoaderExpression.java

+     * "fusing" the expression into the load. Or null if the fusion isn't possible.
+     */
+    @Nullable
+    Fuse tryFuse(SearchStats stats);


Let's try to find another name - we already have Fuse as a command. ExpressionFieldLoader?

Is FusedExpression ok? Or still too indicative?

Naming... 😅

I come from staring at FUSE enough that it carries a lot of weight.

For me, this feature involves BlockLoaders. And Expressions that are applied to them. I understand that fuse means getting together those two, but it's not something I would think of immediately without more context.

I'd prefer to be overly explicit here, and call this BlockLoaderExpression or something similar that helps me bridge those two concepts together. But, naming...

carlosdelest · 2025-10-30T16:18:14Z

...lasticsearch/xpack/esql/optimizer/rules/logical/local/PushDownVectorSimilarityFunctions.java

+        BlockLoaderExpression.Fuse fuse
    ) {
-        // Only replace if exactly one side is a literal and the other a field attribute
-        if ((similarityFunction.left() instanceof Literal ^ similarityFunction.right() instanceof Literal) == false) {


Nice! It's much better to let the Expression deal with the details and make this generic 👍
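To illustrate what moving the check into the expression looks like, here is a hedged sketch in which the "exactly one side is a literal" XOR test from the removed lines lives inside a hypothetical `Similarity` expression, and the rule only asks "can you fuse?". `Similarity`, `Literal`, `PushToFieldLoad`, and the string used as the fusion config are made-up names for this example; the real rule and function classes differ.

```java
import java.util.Arrays;

// Hypothetical minimal expression tree.
interface Expression {}

record FieldRef(String name) implements Expression {}

record Literal(float[] value) implements Expression {}

// Hypothetical fusion result: the field to load plus a description of the function to apply.
record Fuse(FieldRef field, String config) {}

interface BlockLoaderExpression {
    Fuse tryFuse();
}

// Hypothetical vector similarity function. The literal/field check now lives here.
record Similarity(String function, Expression left, Expression right) implements Expression, BlockLoaderExpression {
    @Override
    public Fuse tryFuse() {
        // Same XOR test as the removed rule code: bail unless exactly one side is a literal.
        if ((left instanceof Literal ^ right instanceof Literal) == false) {
            return null;
        }
        Expression other = left instanceof Literal ? right : left;
        Literal vector = (Literal) (left instanceof Literal ? left : right);
        // The non-literal side must be a plain field so it can be loaded with the function fused in.
        if (other instanceof FieldRef f) {
            return new Fuse(f, function + Arrays.toString(vector.value()));
        }
        return null;
    }
}

// The rule is now generic: it knows nothing about similarity functions specifically.
final class PushToFieldLoad {
    static Fuse tryPush(Expression e) {
        return e instanceof BlockLoaderExpression b ? b.tryFuse() : null;
    }
}
```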


nik9000 · 2025-10-30T19:46:11Z

x-pack/plugin/esql-core/src/main/java/org/elasticsearch/xpack/esql/core/type/EsField.java

+     */
+    public boolean pushable() {
+        return true;
+    }


This bothers me. I needed this because without it we'd try to push this:

```
FROM foo | WHERE LENGTH(kwd) < 10
```

to the index. We might be able to do that with a specialized Lucene query, but we don't have one of those. Without this change, what happens instead is:

`LENGTH(kwd)` becomes `$$kwd$length$hash$`.

We identify `$$kwd$length$hash$ < 10` as pushable.

`pushable()` is what tells us we can't push it. But it's kind of picky. If `SearchStats` took an `EsField`, it could check this easily enough. That might be a good solution to this.

The MultiTypeEsField is created with aggregatable=false, so that predicates on it don't get pushed down incorrectly.

Adding pushable should also work.
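A sketch of the `pushable()` idea under simplifying assumptions: `EsField` here is a stripped-down stand-in, `FunctionEsField` just overrides `pushable()` to return `false`, and `PushdownCheck.canPushComparison` is a hypothetical helper that also treats `aggregatable=false` as "don't push", echoing how `MultiTypeEsField` is described above. None of this is the real planner code.

```java
// Hypothetical, heavily simplified EsField: just a name and an aggregatable flag.
class EsField {
    private final String name;
    private final boolean aggregatable;

    EsField(String name, boolean aggregatable) {
        this.name = name;
        this.aggregatable = aggregatable;
    }

    String name() {
        return name;
    }

    boolean aggregatable() {
        return aggregatable;
    }

    // Default: predicates on this field may be translated into an index query.
    boolean pushable() {
        return true;
    }
}

// Hypothetical field whose values are computed while loading, e.g. LENGTH(kwd).
class FunctionEsField extends EsField {
    FunctionEsField(EsField source, String function) {
        super(source.name() + "$" + function, source.aggregatable());
    }

    // Nothing in the index answers "LENGTH(kwd) < 10" directly, so never push predicates on it.
    @Override
    boolean pushable() {
        return false;
    }
}

final class PushdownCheck {
    // Hypothetical planner check: push `field < literal` to the index only if the field allows it.
    static boolean canPushComparison(EsField field) {
        return field.aggregatable() && field.pushable();
    }
}
```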


Adds special-purpose `BlockLoader` implementations for the `MV_MIN` and `MV_MAX` functions for `keyword` fields with doc values. These are a no-op for single-valued keywords but should be *much* faster for multivalued keywords. These aren't plugged in yet. We can plug them in and performance test them in elastic#137382. And they give us two more functions we can use to demonstrate elastic#137382.
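Why `MV_MIN`/`MV_MAX` can get cheap for keywords with doc values: Lucene's sorted-set doc values return a document's ordinals in increasing order, and ordinal order follows the sorted term order, so the minimum is the first ordinal and the maximum is the last. The sketch below models that with plain arrays. `KeywordDocValues` is a hypothetical in-memory stand-in, not the block loader from the PR; it only illustrates why single-valued docs gain nothing while multivalued ones need a single term lookup instead of one per value.

```java
// Hypothetical in-memory model of keyword doc values: a sorted term dictionary plus,
// for each document, its ordinals in increasing order (as sorted-set doc values return them).
record KeywordDocValues(String[] sortedTerms, int[][] docOrds) {

    // MV_MIN: the smallest ordinal is first, and ordinal order follows term order,
    // so only one term lookup is needed however many values the document has.
    String mvMin(int doc) {
        int[] ords = docOrds[doc];
        return ords.length == 0 ? null : sortedTerms[ords[0]];
    }

    // MV_MAX: likewise, the largest term belongs to the last ordinal.
    String mvMax(int doc) {
        int[] ords = docOrds[doc];
        return ords.length == 0 ? null : sortedTerms[ords[ords.length - 1]];
    }
}
```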


nik9000 · 2025-10-31T19:52:07Z

...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java

+    }
+
+    public void testLengthInWhereAndEval() {
+        assumeFalse("fix me", true);


QL friends: This one looks fun!

The reason we get duplicated reference attributes here is that when PushExpressionsToFieldLoad creates a new FunctionEsField in EsRelation, it does so within the context of a single command and doesn't look at the query plan as a whole. So when the same LENGTH(last_name) is referenced in multiple commands in the query, duplicate FunctionEsFields are added to EsRelation.

ResolveUnionTypes has a very similar workflow: it iterates through the entire query plan to prepare the attributes added to EsRelation.

++, I'm rewriting this to look more like ResolveUnionTypes in #137392
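A hedged sketch of the whole-plan approach described above: collect fusable expressions across every command first, keyed by the field plus the fused function, so a second `LENGTH(last_name)` reuses the attribute created for the first. `FusionKey`, `FusedAttribute`, `FusedAttributeRegistry`, and the attribute naming are hypothetical; this is not how `ResolveUnionTypes` or the final rule are actually written.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical identity of a fused load: the source field plus the function fused into it.
record FusionKey(String field, String function) {}

// Hypothetical attribute that gets added to the relation for one fused load.
record FusedAttribute(String name, FusionKey key) {}

// Collects fused loads across the whole plan so each distinct (field, function) pair yields one attribute.
final class FusedAttributeRegistry {
    private final Map<FusionKey, FusedAttribute> byKey = new LinkedHashMap<>();

    // Reuse the attribute created earlier for the same key, or create it on first sight.
    FusedAttribute attributeFor(FusionKey key) {
        return byKey.computeIfAbsent(key, k -> new FusedAttribute("$$" + k.field() + "$" + k.function(), k));
    }

    // One attribute per distinct key, in first-seen order, to add to the relation.
    List<FusedAttribute> relationAttributes() {
        return List.copyOf(byKey.values());
    }
}
```

Usage, assuming the same `LENGTH(last_name)` appears in both a `WHERE` and an `EVAL`: both calls to `attributeFor(new FusionKey("last_name", "length"))` return the same attribute, so the relation only gains one column.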

elasticsearchmachine · 2025-10-31T19:52:43Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

nik9000 · 2025-10-31T19:52:55Z

This is ready for folks to look at! I left two open questions around planning as self-comments.

nik9000 · 2025-10-31T20:20:55Z

I think I'd also like to write a really basic integration test that proves the fusion is happening.

nik9000 · 2025-10-31T21:03:02Z

A little test: https://gist.github.com/nik9000/3a8eb7065d20c6f36d0c71fad80d1d2c

```
FROM test
| STATS SUM(LENGTH(big))
```

With few unique values the perf goes from ~85ms to ~13ms.

If there are enough values to trigger #137217 (comment), then it's slower. So, as always, @dnhatn was right. I'll do the sort, iter, uniq bit he suggested.
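One plausible reading of the "sort, iter, uniq" suggestion (the linked comment isn't reproduced here, so treat this as an assumption): when many documents share the same terms, sort the ordinals, walk them skipping repeats, and compute `LENGTH` once per distinct term instead of once per document. `OrdinalLengths` and the single-valued `int[]` ordinal input are hypothetical simplifications, and counting code points is an assumption about how `LENGTH` counts characters.

```java
import java.util.Arrays;

// Hypothetical: LENGTH over a block of single-valued keyword docs, given each doc's ordinal
// into a sorted term dictionary. Each distinct term is measured once, not once per doc.
final class OrdinalLengths {
    static int[] lengths(int[] docOrds, String[] sortedTerms) {
        // Sort: put a copy of the ordinals in order so repeats sit next to each other.
        int[] sorted = docOrds.clone();
        Arrays.sort(sorted);

        // Iterate + uniq: keep each distinct ordinal once and compute its length once.
        int[] distinct = new int[sorted.length];
        int[] distinctLengths = new int[sorted.length];
        int n = 0;
        for (int ord : sorted) {
            if (n == 0 || ord != distinct[n - 1]) {
                String term = sortedTerms[ord];
                distinct[n] = ord;
                // Count code points (assumed LENGTH semantics), not UTF-16 chars.
                distinctLengths[n] = term.codePointCount(0, term.length());
                n++;
            }
        }

        // Map every doc back to the precomputed length of its ordinal.
        int[] out = new int[docOrds.length];
        for (int d = 0; d < docOrds.length; d++) {
            int idx = Arrays.binarySearch(distinct, 0, n, docOrds[d]);
            out[d] = distinctLengths[idx];
        }
        return out;
    }
}
```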

carlosdelest

LGTM 💯

Some minor nits on naming (I'd really like to stay away from another FUSE word).

carlosdelest · 2025-11-03T07:35:48Z

...java/org/elasticsearch/xpack/esql/expression/function/blockloader/BlockLoaderExpression.java

+     * "fusing" the expression into the load. Or null if the fusion isn't possible.
+     */
+    @Nullable
+    FusedBlockLoaderExpression tryFuse(SearchStats stats);


Suggested change

-    FusedBlockLoaderExpression tryFuse(SearchStats stats);
+    FusedBlockLoaderExpression getFusedBlockLoaderExpression(SearchStats stats);

I've swapped this to tryPushToFieldLoading.


carlosdelest · 2025-11-03T07:38:42Z

...va/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/FuseExpressionToFieldLoad.java

+public class FuseExpressionToFieldLoad extends OptimizerRules.ParameterizedOptimizerRule<LogicalPlan, LocalLogicalOptimizerContext> {

-    public PushDownVectorSimilarityFunctions() {
+    public FuseExpressionToFieldLoad() {


Suggested change

-    public FuseExpressionToFieldLoad() {
+    public BlockLoaderExpressionToFieldLoad() {

I'm going to use PushExpressionsToFieldLoad.

carlosdelest · 2025-11-03T07:43:15Z

...esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/BlockLoaderWarnings.java

+import org.elasticsearch.compute.operator.Warnings;
+import org.elasticsearch.xpack.esql.core.tree.Source;
+
+public class BlockLoaderWarnings implements org.elasticsearch.index.mapper.blockloader.Warnings {


I guess these are warnings that can be created when using BlockLoader function config to load values. Should we add that to the javadoc?


fang-xing-esql · 2025-11-03T21:38:34Z


fang-xing-esql · 2025-11-03T21:39:18Z

...sql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LocalLogicalPlanOptimizerTests.java

+
+    public void testLengthInWhereAndEval() {
+        assumeFalse("fix me", true);
+        assumeTrue("requires similarity functions", EsqlCapabilities.Cap.VECTOR_SIMILARITY_FUNCTIONS_PUSHDOWN.isEnabled());


This seems to work! That is an old capability.

fang-xing-esql · 2025-11-03T21:44:45Z

...a/org/elasticsearch/xpack/esql/optimizer/rules/logical/local/PushExpressionsToFieldLoad.java

-        );
-        var name = rawTemporaryName(fieldAttr.name(), similarityFunction.nodeName(), String.valueOf(arrayHashCode));
+        FunctionEsField functionEsField = new FunctionEsField(fuse.field().field(), e.dataType(), fuse.config());
+        var name = rawTemporaryName(fuse.field().name(), fuse.config().name(), String.valueOf(fuse.config().hashCode()));


Is there a particular reason that we need a hash code in the new attribute's name?

My original assumption was that we needed the name to be the same for attributes that had the same field name and similarity vector, so I included the vector hash as part of the name.

Alex told me that this is not necessary, so I'll remove that in #137392.

nik9000 added >enhancement :Analytics/ES|QL AKA ESQL v9.3.0 labels Oct 30, 2025

nik9000 and others added 3 commits October 30, 2025 10:39

Update docs/changelog/137382.yaml

7758ab9

[CI] Auto commit changes from spotless

47c874e

More tests

6b0fead

carlosdelest reviewed Oct 30, 2025


nik9000 added 2 commits October 30, 2025 15:42

Tests

1dd6d96

Merge remote-tracking branch 'nik9000/esql_fuse_length' into esql_fus…

ce8e6aa


nik9000 commented Oct 30, 2025


elasticsearchmachine and others added 5 commits October 30, 2025 19:49

[CI] Auto commit changes from spotless

d50a74b

Add names back

8746cfa

Merge branch 'main' into esql_fuse_length

15eb5e9

Renam

d01184b

[CI] Auto commit changes from spotless

d6897d8

nik9000 mentioned this pull request Oct 31, 2025

ESQL: improve performance - Merge functions into loaders (sometimes) #103636

Open

nik9000 requested review from alex-spies, fang-xing-esql and julian-elastic October 31, 2025 14:05

nik9000 mentioned this pull request Oct 31, 2025

Block loaders for MV_MIN and MV_MAX for keywords #137473

Open

nik9000 added 3 commits October 31, 2025 13:45

Merge branch 'main' into esql_fuse_length

e091312

More tests

538b72b

Merge remote-tracking branch 'nik9000/esql_fuse_length' into esql_fus…

2fc8fd5


nik9000 commented Oct 31, 2025


nik9000 marked this pull request as ready for review October 31, 2025 19:52

nik9000 requested a review from carlosdelest October 31, 2025 19:52

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 31, 2025

[CI] Auto commit changes from spotless

e310d4b

Merge branch 'main' into esql_fuse_length

bb81e43

carlosdelest approved these changes Nov 3, 2025


nik9000 and others added 8 commits November 3, 2025 10:24

Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/…

bb9bca2

…expression/function/vector/L2Norm.java Co-authored-by: Carlos Delgado

Javadoc

d70bca9

Merge remote-tracking branch 'nik9000/esql_fuse_length' into esql_fus…

5fb351c


Rename

39ae15a

[CI] Auto commit changes from spotless

90005ca

Merge branch 'main' into esql_fuse_length

b98935c

fix

c462d07

Merge remote-tracking branch 'nik9000/esql_fuse_length' into esql_fus…

667e58a


fang-xing-esql reviewed Nov 3, 2025


4 participants